
fix: updating SafeToRetry() function to retry on wrapped errors #141

Merged 1 commit into jackc:master on May 9, 2024

Conversation

@tjasko (Contributor) commented Apr 24, 2024

There is a rather unique bug when a single connection is being slammed with queries. When statement caching is enabled and the `deallocateInvalidatedCachedStatements()` method is called to clean up, the `Pipeline.Sync()` method can error out because the connection is in use.

`connLockError` already implements the implicit `SafeToRetry() bool` interface, which is then checked by the `SafeToRetry()` function. When the connection is locked because it is in use, the `deallocateInvalidatedCachedStatements()` method can return a wrapped error that looks like:

`failed to deallocate cached statement(s): conn busy`

When the `stdlib` PGX wrapper is used, it is possible for this error not to be retried, as the wrapper also relies on this `SafeToRetry()` function. To correct this and prevent similar errors in the future, the `SafeToRetry()` function has been updated to check against wrapped errors as well.
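For illustration, here is a minimal sketch of the difference (the function names are illustrative and the exact diff may differ): a plain type assertion only sees the outermost error, while an `errors.As`-based check walks the chain of wrapped errors.

```go
package retry

import "errors"

// safeToRetryDirect: a plain type assertion only inspects the outermost error,
// so a wrapped retry-safe error such as
// "failed to deallocate cached statement(s): conn busy" is reported as unsafe.
func safeToRetryDirect(err error) bool {
	if e, ok := err.(interface{ SafeToRetry() bool }); ok {
		return e.SafeToRetry()
	}
	return false
}

// safeToRetryWrapped: errors.As walks the chain of wrapped errors, so the
// SafeToRetry method on the underlying error is still found and honored.
func safeToRetryWrapped(err error) bool {
	var retryableErr interface{ SafeToRetry() bool }
	if errors.As(err, &retryableErr) {
		return retryableErr.SafeToRetry()
	}
	return false
}
```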

If approved, the change should be made in PGX as well:
https://github.com/jackc/pgx/blob/v5.5.5/pgconn/errors.go#L13-L19

@jackc (Owner) commented May 8, 2024

Regarding `failed to deallocate cached statement(s): conn busy`:

`conn busy` should never occur unless there is a bug in the application code or pgx itself. If this is a pgx bug, then that should be fixed instead of essentially papering over the underlying issue.

As far as the actual change goes... Hmmm... Errors are considered unsafe to retry if they do not define a `SafeToRetry` method. Considering an unsafe error safe because it wraps a safe error seems a little strange. But on the other hand, it's difficult to think of a case where that wouldn't be correct. Maybe something with batches or pipeline mode. 🤔
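To make that concern concrete, here is a small, self-contained illustration (the `lockErr` type is a stand-in, not pgconn's actual `connLockError`): with the wrapped-error check, an error that merely wraps a retry-safe error is itself reported as safe, even though the wrapping code never defined `SafeToRetry`.

```go
package main

import (
	"errors"
	"fmt"
)

// lockErr is a stand-in for a retry-safe error like pgconn's connLockError.
type lockErr struct{}

func (lockErr) Error() string     { return "conn busy" }
func (lockErr) SafeToRetry() bool { return true }

// safeToRetry mirrors the wrapped-error check proposed in this PR.
func safeToRetry(err error) bool {
	var r interface{ SafeToRetry() bool }
	if errors.As(err, &r) {
		return r.SafeToRetry()
	}
	return false
}

func main() {
	wrapped := fmt.Errorf("failed to deallocate cached statement(s): %w", lockErr{})
	fmt.Println(safeToRetry(wrapped)) // true: the wrapper inherits retry safety
}
```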

@tjasko (Contributor, Author) commented May 8, 2024

You know what Jack, I was writing up a long response, but after some rubber duck debugging myself, I realized there was an issue within the application where an existing mutex wasn't blocking all read/write DB operations. That's probably why I was running into this, derp.

In regard to this change, I'll defer to whatever you think is best. To be honest, I did find it a little odd that this was the only place (that I could find) where a retriable error was being wrapped. Per standard Golang conventions, one would typically assume that an error wrapping one with a `SafeToRetry()` method is retriable (hence the reason for this PR), but if that's not the case, we can just close out this PR. I would be a little worried about future contributors wrapping this error and accidentally introducing a bug when the error should be retried... so perhaps that's a valid reason to merge this in. On the other hand, this could be a documentation problem, and we'd just need to explicitly document that these errors shouldn't be wrapped.

Thanks for the assistance here!

@jackc (Owner) commented May 9, 2024

On further thought, I think this is a correct change.

@jackc merged commit 4e2e7a0 into jackc:master on May 9, 2024. 12 checks passed.
@jackc (Owner) commented May 9, 2024

I've ported the change to v5 as well.

@tjasko deleted the fix/retry-with-wrapped-errs branch on May 9, 2024 at 23:03.
@tjasko (Contributor, Author) commented May 9, 2024

Nice, thanks for your help!
